Reinforcement Learning for Control with Multiple Frequencies

Neural Information Processing Systems

Many real-world sequential decision problems involve multiple action variables whose control frequencies differ, so that actions take effect at different periods. While these problems can be formulated with the notion of multiple action persistences in a factored-action MDP (FA-MDP), it is non-trivial to solve them efficiently, since an action-persistent policy constructed from a stationary policy can be arbitrarily suboptimal, rendering solution methods for standard FA-MDPs hardly applicable. In this paper, we formalize the problem of multiple control frequencies in RL and provide an efficient solution method. Our proposed method, Action-Persistent Policy Iteration (AP-PI), provides a theoretical guarantee of convergence to an optimal solution while incurring only a factor of $|A|$ increase in time complexity during the policy improvement step, compared to standard policy iteration for FA-MDPs. In the experiments, we demonstrate that AP-AC significantly outperforms the baselines on several continuous control tasks and a traffic control simulation, which highlights the effectiveness of our method that directly optimizes the periodic non-stationary policy for tasks with multiple control frequencies.


Review for NeurIPS paper: Reinforcement Learning for Control with Multiple Frequencies

Neural Information Processing Systems

Summary and Contributions: This work introduces an algorithm for reinforcement learning in settings with factored action spaces in which each element of the action space may have a different control frequency. To motivate the necessity of such an algorithm, it argues that in this setting a naive approach using a stationary Markovian policy on the states (one that does not observe the timestep) can be suboptimal. Further, it argues that simply augmenting the state or action space and applying standard RL methods incurs costs that are exponential in L, the least common multiple of the set of action persistences. In constructing the method, the paper introduces c-persistent Bellman operators, a way of updating a Q-function in an environment with multiple action persistences, and proves their convergence. This leads to a method that uses L Q-functions, one for each step in the periodic structure of action persistences.
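The periodic action-persistence structure the review describes can be made concrete with a minimal sketch. The wrapper below is an illustrative assumption, not code from the paper: each action dimension i is only refreshed every c_i steps, so the executed action sequence becomes periodic with period L, the least common multiple of the persistences.

```python
def persist_actions(proposed, persistences):
    """Yield the action actually executed at each step t.

    `proposed[t]` is the factored action the policy would like to take
    at step t; dimension i is only refreshed when t % persistences[i] == 0,
    otherwise its previously executed value persists.
    """
    executed = list(proposed[0])          # at t = 0 every dimension is set
    for t, a in enumerate(proposed):
        for i, c in enumerate(persistences):
            if t % c == 0:                # dimension i is controllable now
                executed[i] = a[i]
        yield tuple(executed)

# Two action dimensions with control periods 1 and 2: the second
# dimension keeps its old value on odd steps, so which dimensions are
# controllable repeats with period L = lcm(1, 2) = 2 -- the reason the
# method maintains L Q-functions, one per phase of the cycle.
proposed = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(list(persist_actions(proposed, [1, 2])))
# -> [(0, 0), (1, 0), (2, 2), (3, 2)]
```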


Review for NeurIPS paper: Reinforcement Learning for Control with Multiple Frequencies

Neural Information Processing Systems

The paper proposes an off-policy policy iteration scheme for factored action spaces in which different action dimensions persist with different frequencies. The reviewers agree that the proposed approach is sound, novel, and well motivated, and that the paper is well written. There is some disagreement about how broad the range of applications of the proposed method is and what this means for the impact of the paper (R5); some concerns regarding the scalability of the approach (R5); and some desire for environments not designed by the authors (R2). The AC believes that, although the application domain may be somewhat niche, and the proposed method is the result of somewhat straightforward reasoning about basic properties of MDPs (I don't mean this in a bad way; such basic ideas are often overlooked), on balance the paper will be useful and of interest to the community.



How To Talk To Plants Using Machine Learning And Gesture Recognition

#artificialintelligence

Botanicus Interacticus is a new interactive plant technology which does not require any new instrumentation in plants. A simple electrode placed in the soil can capture a wide range of frequencies produced by the plant, turning it into a multi-touch, gesture-sensitive controller. Touché, a project developed at Disney Research, uses these captured frequencies to sense events at the plant and simultaneously recognises complex human physical interactions with it. In simple words, it can tell what kind of touch event has occurred -- caressing, pinching, holding, tickling, etc. Traditional capacitive sensors work by generating an electrical signal at a single frequency, which is applied to a conductive surface such as metal. The capacitance changes when a hand is close enough to the surface or is in contact with it.
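The recognition step the article alludes to can be sketched as matching a measured frequency-response profile against gesture templates. The snippet below is a toy illustration, not Disney's Touché implementation; the profiles, gesture labels, and the nearest-neighbour rule are all made-up assumptions.

```python
def classify(profile, templates):
    """Return the gesture whose template profile is closest in squared L2 distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda g: dist(profile, templates[g]))

# Hypothetical capacitance responses at four swept frequencies per gesture.
templates = {
    "no touch": [0.0, 0.0, 0.0, 0.0],
    "one-finger touch": [0.9, 0.6, 0.3, 0.1],
    "grasp": [1.0, 1.0, 0.9, 0.8],
}

print(classify([0.85, 0.55, 0.35, 0.05], templates))
# -> one-finger touch
```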